191 research outputs found

    Time Series Data Mining Algorithms for Identifying Short RNA in Arabidopsis thaliana

    The class of molecules called short RNAs (sRNAs) is known to play a key role in gene regulation. They are typically sequences of 21-25 nucleotides in length. The identification, clustering and classification of sRNA has recently become the focus of much research activity. The basic problem involves detecting regions of interest on the chromosome where the pattern of candidate matches is somehow unusual. Currently, there are no published algorithms for detecting regions of interest, and the unpublished methods that we are aware of involve bespoke rule-based systems designed for a specific organism. Work in this very new field has understandably focused on the outcomes rather than the methods used to obtain the results. In this paper we propose two generic approaches that place the specific biological problem in the wider context of time series data mining problems. Both methods are based on treating the occurrences on a chromosome, or “hit count” data, as a time series, then running a sliding window along a chromosome and measuring unusualness. This formulation means we can treat finding unusual areas of candidate RNA activity as a variety of time series anomaly detection problem. The first set of approaches is model based. We specify a null hypothesis distribution for not being an sRNA, then estimate the p-values along the chromosome. The second approach is instance based. We identify some typical shapes from known sRNA, then use dynamic time warping and Fourier transform based distance to measure how closely the candidate series matches. We demonstrate that these methods can find known sRNA on Arabidopsis thaliana chromosomes and illustrate the benefits of the added information provided by these algorithms.
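
    The model-based approach described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a Poisson null with a known per-position background rate (the abstract does not name the null distribution), and the function names, window width, and toy hit counts are all hypothetical.

```python
import math


def poisson_sf(k, mu):
    """Upper-tail probability P(X >= k) for X ~ Poisson(mu), with mu > 0.

    The tail is summed directly (rather than as 1 - CDF) so that very
    small p-values are not rounded to zero.
    """
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in k!
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add successive terms P(X = i) until they no longer change the sum.
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= mu / i
    return min(1.0, total)


def window_p_values(hits, width, background_rate):
    """Slide a fixed-width window along the hit counts.

    Assumed null: each position contributes Poisson(background_rate) hits,
    so a window total is Poisson(width * background_rate).
    Returns one p-value per window start position.
    """
    mu = width * background_rate
    window_sum = sum(hits[:width])
    p_values = [poisson_sf(window_sum, mu)]
    for start in range(1, len(hits) - width + 1):
        # Update the running total: add the entering count, drop the leaving one.
        window_sum += hits[start + width - 1] - hits[start - 1]
        p_values.append(poisson_sf(window_sum, mu))
    return p_values


# Toy hit-count series with a burst of hits in the middle (made-up data).
hits = [0, 1, 0, 0, 9, 12, 8, 0, 1, 0]
p = window_p_values(hits, width=3, background_rate=0.5)
most_unusual = min(range(len(p)), key=p.__getitem__)
print(most_unusual, p[most_unusual])
```

    Windows with the smallest p-values are the candidate regions of interest. In practice the background rate would have to be estimated from the data, and the p-values corrected for the many overlapping windows tested.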

    Computational classification of small RNAs and their targets

    Small RNAs, and in particular microRNAs, are currently receiving a great deal of attention due to their important roles in gene regulation and organism development. Recently, new high-throughput technologies have made it possible to sequence hundreds of thousands of small RNAs from a single experimental sample. In this thesis we develop new computational tools to process such high-throughput small RNA datasets in order to identify microRNAs and other biologically interesting small RNA candidates and to predict their target genes. We apply these tools to a variety of plant and animal datasets and present some novel discoveries including miRNAs involved in fruit development in tomato (Solanum lycopersicon).

    Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation

    Background: Metatranscriptome sequence data can contain highly redundant sequences from diverse populations of microbes and so data reduction techniques are often applied before taxonomic and functional annotation. For metagenomic data, it has been observed that the variable coverage and presence of closely related organisms can lead to fragmented assemblies containing chimeric contigs that may reduce the accuracy of downstream analyses and some advocate the use of alternate data reduction techniques. However, it is unclear how such data reduction techniques impact the annotation of metatranscriptome data and thus affect the interpretation of the results. Results: To investigate the effect of such techniques on the annotation of metatranscriptome data we assess two commonly employed methods: clustering and de-novo assembly. To do this, we also developed an approach to simulate 454 and Illumina metatranscriptome data sets with varying degrees of taxonomic diversity. For the Illumina simulations, we found that a two-step approach of assembly followed by clustering of contigs and unassembled sequences produced the most accurate reflection of the real protein domain content of the sample. For the 454 simulations, the combined annotation of contigs and unassembled reads produced the most accurate protein domain annotations. Conclusions: Based on these data we recommend that assembly be attempted, and that unassembled reads be included in the final annotation for metatranscriptome data, even from highly diverse environments as the resulting annotations should lead to a more accurate reflection of the transcriptional behaviour of the microbial population under investigation.

    FilTar: Using RNA-Seq data to improve microRNA target prediction accuracy in animals

    MOTIVATION: MicroRNA (miRNA) target prediction algorithms do not generally consider biological context and therefore generic target prediction based on seed binding can lead to a high level of false-positive predictions. Here, we present FilTar, a method that incorporates RNA-Seq data to make miRNA target prediction specific to a given cell type or tissue of interest. RESULTS: We demonstrate that FilTar can be used to: (i) provide sample specific 3'-UTR reannotation; extending or truncating default annotations based on RNA-Seq read evidence and (ii) filter putative miRNA target predictions by transcript expression level, thus removing putative interactions where the target transcript is not expressed in the tissue or cell line of interest. We test the method on a variety of miRNA transfection datasets and demonstrate increased accuracy versus generic miRNA target prediction methods. AVAILABILITY AND IMPLEMENTATION: FilTar is freely available and can be downloaded from https://github.com/TBradley27/FilTar. The tool is implemented using the Python and R programming languages, and is supported on GNU/Linux operating systems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
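
    The expression-based filtering step in (ii) can be illustrated with a short sketch. This is not FilTar's actual interface: the function name, the data layout (miRNA/transcript pairs plus a transcript-to-TPM mapping), the identifiers, and the 1.0 TPM cutoff are all assumptions made for illustration only.

```python
def filter_by_expression(predictions, tpm, min_tpm=1.0):
    """Keep only predicted (miRNA, transcript) pairs whose target transcript
    is expressed at or above min_tpm in the sample of interest.

    Transcripts missing from the expression table are treated as unexpressed.
    """
    return [
        (mirna, transcript)
        for mirna, transcript in predictions
        if tpm.get(transcript, 0.0) >= min_tpm
    ]


# Hypothetical generic predictions and per-transcript expression (TPM).
predictions = [("miR-1", "T1"), ("miR-1", "T2"), ("miR-2", "T3")]
tpm = {"T1": 12.0, "T2": 0.0}

kept = filter_by_expression(predictions, tpm)
print(kept)
```

    Only the interaction whose target is expressed in the sample survives the filter; the right cutoff would depend on the dataset and quantification method.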

    Global discovery and characterization of small non-coding RNAs in marine microalgae

    Background: Marine phytoplankton are responsible for 50% of the CO2 that is fixed annually worldwide and contribute massively to other biogeochemical cycles in the oceans. Diatoms and coccolithophores play a significant role as the base of the marine food web and they sequester carbon due to their ability to form blooms and to biomineralise. To discover the presence and regulation of short non-coding RNAs (sRNAs) in these two important phytoplankton groups, we sequenced short RNA transcriptomes of two diatom species (Thalassiosira pseudonana, Fragilariopsis cylindrus) and validated them by Northern blots along with the coccolithophore Emiliania huxleyi. Results: Despite an exhaustive search, we did not find canonical miRNAs in diatoms. The most prominent classes of sRNAs in diatoms were repeat-associated sRNAs and tRNA-derived sRNAs. The latter were also present in E. huxleyi. tRNA-derived sRNAs in diatoms were induced under important environmental stress conditions (iron and silicate limitation, oxidative stress, alkaline pH), and they were very abundant especially in the polar diatom F. cylindrus (20.7% of all sRNAs) even under optimal growth conditions. Conclusions: This study provides the first experimental evidence for the existence of short non-coding RNAs in marine microalgae. Our data suggest that canonical miRNAs are absent from diatoms. However, the group of tRNA-derived sRNAs seems to be very prominent in diatoms and coccolithophores and may be used for acclimation to environmental conditions.

    Rfam: annotating non-coding RNAs in complete genomes

    Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/

    miR-133-mediated regulation of the Hedgehog pathway orchestrates embryo myogenesis

    Skeletal myogenesis serves as a paradigm to investigate the molecular mechanisms underlying exquisitely regulated cell fate decisions in developing embryos. The evolutionarily conserved miR-133 family of microRNAs is expressed in the myogenic lineage, but how it acts remains incompletely understood. Here we performed genome-wide differential transcriptomics of miR-133 knock-down (KD) embryonic somites, the source of vertebrate skeletal muscle. This revealed extensive downregulation of Sonic hedgehog (Shh) pathway components: patched receptors, Hedgehog interacting protein, and the transcriptional activator, Gli1. By contrast, Gli3, a transcriptional repressor, was de-repressed and confirmed as a direct miR-133 target. Phenotypically, miR-133 KD impaired myotome formation and growth by disrupting proliferation, extracellular matrix deposition and epithelialization. Together this suggests that miR-133-mediated Gli3 silencing is critical for embryonic myogenesis. Consistent with this idea, we found that activation of Shh signalling by purmorphamine, or KD of Gli3 by antisense morpholino (MO), rescued the miR-133 KD phenotype. We identify a novel Shh/MRF/miR-133/Gli3 axis that connects epithelial morphogenesis with myogenic fate specification.

    Suspended manufacture of biological structures

    We present a novel method of extrusion-based ALM for the production of cell-laden structures from low viscosity polymers. The traditional planar print bed is replaced with a bed of microparticulate fluid gel. During the extrusion process, the fluid gel is displaced whilst providing a support structure for the low viscosity material, allowing manufacture of relatively complex geometries. The extruded structure can then be easily removed from this self-healing fluid bed. For this study, a bi-layered cell-seeded construct was produced to model the osteochondral junction. Osteochondral plugs were produced by the addition of chondrocytes and osteoblasts to 1.5%w/v gellan and 1.5%w/v gellan-5% nano-hydroxyapatite respectively. The consecutive extrusion of these two solutions into the fluid bed followed by further ionic crosslinking produced the bi-layered construct that was implanted into a femoral condyle defect in vitro. Cell viability following extrusion was confirmed using calcein AM/PI live/dead staining, showing excellent viability. Constructs were then sectioned, and qRT-PCR was performed, showing a native collagen phenotype across the construct with evidence of matrix markers in the cartilage-like region, which were also identified using fluorescent-IHC. Constructs were also tested for their bulk relaxation properties. Addition of nano-hydroxyapatite in the bone-like region resulted in a faster, more elastic relaxation than gellan alone, something that has previously been reported to favour osteogenic differentiation.

    The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs

    Motivation: RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs (sRNA). Recent technical advances enabled the analysis of larger, complex datasets and the investigation of microRNAs and the lesser-known small interfering RNAs. However, the size and intricacy of current data require a comprehensive set of tools able to discriminate the patterns from the low-level, noise-like variation. Numerous and varied suggestions from the community represent an invaluable source of ideas for future tools, so the ability of the community to contribute to this software is essential. Results: We present a new version of the UEA sRNA Workbench, reconfigured to allow an easy insertion of new tools/workflows. In its released form, it comprises a suite of tools in a user-friendly environment, with enhanced capabilities for a comprehensive processing of sRNA-seq data, e.g. tools for an accurate prediction of sRNA loci (CoLIde) and miRNA loci (miRCat2), as well as workflows to guide users through common steps such as quality checking of the input data, normalization of abundances, or detection of differential expression, which represent the first steps in sRNA-seq analyses.

    miRCat2: Accurate prediction of plant and animal microRNAs from next-generation sequencing datasets

    Motivation: MicroRNAs are a class of ∼21-22 nucleotide small RNAs which are excised from a stable hairpin-like secondary structure. They have important gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in eukaryotes. There are several computational tools for miRNA detection from next-generation sequencing (NGS) datasets. However, many of these tools suffer from high false positive and false negative rates. Here we present a novel miRNA prediction algorithm, miRCat2. miRCat2 incorporates a new entropy-based approach to detect miRNA loci, which is designed to cope with the high sequencing depth of current NGS datasets. It has a user-friendly interface and produces graphical representations of the hairpin structure and plots depicting the alignment of sequences on the secondary structure. Results: We tested miRCat2 on a number of animal and plant datasets and present a comparative analysis with miRCat, miRDeep2, miRPlant and miReap. We also use mutants in the miRNA biogenesis pathway to evaluate the predictions of these tools. Results indicate that miRCat2 has an improved accuracy compared with other methods tested. Moreover, miRCat2 predicts several new miRNAs that are differentially expressed in wildtype versus mutants in the miRNA biogenesis pathway. Availability: miRCat2 is part of the UEA small RNA Workbench and is freely available from http://srnaworkbench.cmp.uea.ac.uk